Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that with limited data far less than needed for training a
model from scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system.
Comment: IWSLT 202
Generalized liquid crystals: giant fluctuations and the vestigial chiral order of $I$, $O$ and $T$ matter
The physics of nematic liquid crystals has been the subject of intensive research
since the late 19th century. However, because of the limitations of chemistry
the focus has been centered around uni- and biaxial nematics associated with
constituents bearing a $D_{\infty h}$ or $D_{2h}$ symmetry, respectively. In
view of general symmetries, however, these are singularly special since nematic
order can in principle involve any point group symmetry. Given the progress in
tailoring nanoparticles with particular shapes and interactions, this vast
family of "generalized nematics" might become accessible in the laboratory.
Little is known since the order parameter theories associated with the highly
symmetric point groups are remarkably complicated, involving tensor order
parameters of high rank. Here we show that the generic features of the
statistical physics of such systems can be studied in a highly flexible and
efficient fashion using a mathematical tool borrowed from high energy physics:
discrete non-Abelian gauge theory. Explicitly, we construct a family of lattice
gauge models encapsulating nematic ordering of general three dimensional point
group symmetries. We find that the most symmetrical "generalized nematics" are
subjected to thermal fluctuations of unprecedented severity. As a result, novel
forms of fluctuation phenomena become possible. In particular, we demonstrate
that a vestigial phase carrying no more than chiral order becomes ubiquitous
departing from high point group symmetry chiral building blocks, such as $I$, $O$
and $T$ symmetric matter.
Comment: 14 pages, 5 figures; published version
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Encoder-decoder models provide a generic architecture for
sequence-to-sequence tasks such as speech recognition and translation. While
offline systems are often evaluated on quality metrics like word error rates
(WER) and BLEU, latency is also a crucial factor in many practical use-cases.
We propose three latency reduction techniques for chunk-based incremental
inference and evaluate their efficiency in terms of accuracy-latency trade-off.
On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 second by
sacrificing 1% WER (6% rel.) compared to offline transcription. Although our
experiments use the Transformer, the hypothesis selection strategies are
applicable to other encoder-decoder models. To avoid expensive re-computation,
we use a unidirectionally-attending encoder. After an adaptation procedure to
partial sequences, the unidirectional model performs on-par with the original
model. We further show that our approach is also applicable to low-latency
speech translation. On How2 English-Portuguese speech translation, we reduce
latency to 0.7 second (-84% rel.) while incurring a loss of 2.4 BLEU points (5%
rel.) compared to the offline system.
Spin-dependent Klein tunneling in graphene: Role of Rashba spin-orbit coupling
Within an effective Dirac theory the low-energy dispersions of monolayer
graphene in the presence of Rashba spin-orbit coupling and spin-degenerate
bilayer graphene are described by formally identical expressions. We explore
implications of this correspondence for transport by choosing chiral tunneling
through pn and pnp junctions as a concrete example. A real-space Green's
function formalism based on a tight-binding model is adopted to perform the
ballistic transport calculations, which cover and confirm previous theoretical
results based on the Dirac theory. Chiral tunneling in monolayer graphene in
the presence of Rashba coupling is indeed shown to behave as it does in bilayer
graphene. Combined effects of a forbidden normal transmission and spin
separation are observed within the single-band n to p transmission regime. The
former comes from real-spin conservation, in analogy with pseudospin
conservation in bilayer graphene, while the latter arises from the intrinsic
spin-Hall mechanism of the Rashba coupling.
Comment: 10 pages, 10 figures